Abstract

In this paper the results of a simulation of a prisoner's dilemma round-robin
tournament are presented. In the tournament each participating
strategy plays an iterated prisoner's dilemma against each other strategy
(round-robin) and as a variant also against itself. The participants of a
tournament are all strategies that are deterministic and have the same size
of memory with regard to their own and their opponent's past actions: up
to three most recent actions of their opponent and up to two most recent
actions of their own. A focus is set on the investigation of the influence of
the number of iterations, details of the payoff matrix, and the influence of
memory size. The main result is that for the tournament as carried out
here, different strategies emerge as winners for different payoff matrices,
even for different payoff matrices that are similar judged on whether they fulfill
the relations T + S = P + R or 2R > T + S. As a consequence of this result
it is suggested that whenever the iterated prisoner's dilemma is used to
model a real system that does not explicitly fix the payoff matrix, one
should check if conclusions remain valid when a different payoff matrix is
used.



1 Introduction and Motivation 

The prisoner's dilemma [1, 2] is probably the most prominent and most discussed
example from game theory, because it serves as the model of the
formation of cooperation in the course of biological as well as cultural evolution
[2, 3].

A naive interpretation of Darwin's theory might suggest that evolution favors
nothing but direct battle and plain competition. However, numerous observations
of cooperation in the animal kingdom plainly contradict this idea.

While such examples among animals are impressive in themselves, clearly the most
complex and complicated interplay of cooperation and competition occurs with 
humans; a fact which becomes most obvious when a large number of humans 
gathers as a crowd in spatial proximity. There are astonishing and well-known 
examples for both: altruism among strangers under dangerous external condi- 
tions [4-11] as well as fierce competition on goods with very limited material 
value often linked with a lack of information [12, 13] - and anything in between 
these two extremes; see for example the overviews in [14, 15]. In relation to these 
events - and possible similar events of pedestrian and evacuation dynamics [16] 
to come in the future - the wide-spread naive interpretation of the theory of 
evolution in a sense poses a danger, as it might give people in such situations 
the wrong idea of what their fellows surrounding them are going to do and by 
this in turn suggest overly competitive and dangerous behavior. Knowing of 
said historic events together with having an idea of theories that suggest why 
cooperation against immediate maximal self-benefit can be rational hopefully 
can immunize against such destructive thoughts and actions. 

From the beginning the prisoner's dilemma was investigated in an iterated
way [17, 18], often with no limit on the ability of strategies to look back on the
course of events of the tournament [2, 19], i.e. they had a memory
potentially including every own and opponent's step. Despite the possibility
of using more memory, the first strategy emerging as winner - tit-for-tat - made do
with a memory of only the most recent action of the opponent. Another famous
and successful strategy - pavlov - also makes use of only a small memory: it
just needs to remember its own and the opponent's most recent action. In this
contribution the effect of an extension of the memory up to the three latest actions of the
opponent and up to two latest own actions is investigated.

In the course of discussion of the prisoner's dilemma a number of meth- 
ods have been introduced like probabilistic strategies to model errors ( "noise" ) 
[20] , evolutionary (ecologic) investigation [2] , spatial relations (players only play 
against spatially neighbored opponents) [21-30], and creation of strategies by 
genetic programming [3, 20, 31-33]. Most of these can be combined. For an 
overview on further variants see review works like [34, 35]. 

Contrary to these elaborate methods, a main guiding line in this work is to 
avoid arbitrary and probabilistic decisions like choosing a subset of strategies of a 
class or locating strategies spatially in neighborhoods; spatial variants as well as 
a genetic approach are excluded. Instead each strategy of the class participates 
and plays against each other. A consequence from investigating complete classes 
is that it is impossible to have a continuous element as constructing element of 
a strategy; this forbids probabilistic strategies. The round-robin mode as well 
- at least in parts - is a consequence of avoiding arbitrariness: drawing lots to 
choose pairs of competitors like in tournaments would bring in a probabilistic 
element. In other words: the source code written for this investigation does not 
at any point make use of random numbers. It is a deterministic brute force 
calculation of a large number of strategies and a very large number of single 
games. The relevance lies not in modeling a specific system of reality, but in
the completeness of the investigated class and in general the small degree of
freedom (arbitrariness) of the system.

By the strictness and generality of the procedure, a strategy can be seen as
a Mealy automaton, or the iterative game between two strategies as a Moore
machine [36-39] or, respectively, a spatially zero-dimensional cellular automaton
[40, 41] (see section 3).

2 Definition of a Strategy 

In the sense of this paper a strategy with a memory size n has n + 1
sub-strategies to define the action in the first, second, ..., nth and any further
iteration. The sub-strategy for the first iteration only decides how a strategy
starts the tournament, the sub-strategy for the second iteration depends on the
action(s) of the first iteration, the sub-strategy for the third iteration depends
on the actions in the first and second iteration (if the memory size is larger than one),
and the sub-strategy for the Nth iteration (N > n) depends on the actions in the
(N - n)th to (N - 1)st iteration (compare Figure 1).

A similar approach has been followed in [42], but there are differences in 
the definition of the class concerning the behavior in the first n — 1 iterations 
and most important it has not been used for a round-robin tournament with all 
strategies of a class, but combined with a genetic approach. 

Another investigation dealing with effects of memory size is [43] . The differ- 
ence there is that the strategies are probabilistic and (therefore) not all strategies 
participate in the process. 

2.1 Data Size of a Strategy, Number of Strategies, and 
Number of Games 

Since at the beginning there is no information from the opponent a strategy 
consists of a decision, how to begin an iterated game (one bit). In the second 
round, there is only information on one past step from the opponent, so the 
strategy includes the decision how to react on this (two bits), the third step is 
still part of the starting phase and therefore also has its own part of the strategy 
(four bits, if the decision does not depend on a strategy's own preceding action). 
Therefore there are 128 strategies if there is a no-own-two-opponent memory. 
Finally with size-three memory, one has eight more bits. As an example in 
Figure 1 it is shown, how one calculates the number combination (1/2/12/240) 
from the tit-for-tat strategy. These 15 bits lead to a total of $2^{15} = 32768$ different
strategies. If each strategy plays against each other strategy and against itself
one has to calculate $N (N + 1)/2 \approx 2^{29}$ different iterated prisoner's dilemmas.

Table 1 sums up these numbers for different memory sizes. To remember the
last n actions of a pair of strategies, one needs 2n bits and for the results of
a strategy over the course of iterations one needs - depending on the kind of
evaluation - a few bytes for each pair of strategies. The number of pairs of
strategies - and this is the limiting component - grows at least approximately
like $2^{2^{n+2} - 3}$. On today's common PCs RAM demands are therefore trivial up
to a memory size of n = 2, in the lower range of 64-bit technology (some GBs
of RAM) for n = 3, and totally unavailable for n = 4 and larger (more than an
exabyte).
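
The counts in this section can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, and the rule for the number of bits (the sub-strategy for iteration k + 1 sees at most k past actions of each player, capped by the respective memory size) is an assumption chosen because it reproduces the bit counts listed in Table 1.

```python
def strategy_bits(own: int, opp: int) -> int:
    """Bits needed to encode one strategy (assumed rule, see lead-in).

    The sub-strategy used after k played iterations depends on
    min(k, own) own actions and min(k, opp) opponent actions and
    therefore needs 2**(min(k, own) + min(k, opp)) bits.
    """
    return sum(2 ** (min(k, own) + min(k, opp))
               for k in range(max(own, opp) + 1))


def num_strategies(own: int, opp: int) -> int:
    """Every bit pattern is a distinct deterministic strategy."""
    return 2 ** strategy_bits(own, opp)


def num_games(n_strategies: int, self_play: bool) -> int:
    """Number of pairings in one iteration step of the round-robin."""
    if self_play:
        return n_strategies * (n_strategies + 1) // 2
    return n_strategies * (n_strategies - 1) // 2


if __name__ == "__main__":
    for own, opp in [(0, 1), (1, 1), (0, 2), (1, 2), (0, 3), (2, 2)]:
        n = num_strategies(own, opp)
        print(f"{own}/{opp}: {strategy_bits(own, opp)} bits, {n} strategies, "
              f"{num_games(n, False)} resp. {num_games(n, True)} games")
```

For the no-own-three-opponent memory this gives 15 bits and 32768 strategies, in line with the numbers stated in the text.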

Figure 1: Tit-for-tat as strategy (1/2/12/240). The part (1/2/12) applies only
in the starting phase, when only no, one or two earlier states of the opponent
exist. Cooperation is coded with a "1", defection with a "0". If a strategy
remembers also its own past actions then these are always stored in the lower
bits, i.e. for example of the triples, the leftmost would indicate a strategy's own
preceding action and the middle and right would indicate the second to last and
last action of the opponent ("low to high" is "left to right").
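
The number coding of Figure 1 can be illustrated with a small decoder. The following Python sketch is not the source code used for the investigation; the names `action` and `play` are hypothetical, and the bit order (oldest remembered opponent action in the lowest bit, most recent one in the highest bit, cooperation = 1) is an assumption read off the figure captions. It covers only strategies that remember opponent actions.

```python
COOPERATE, DEFECT = 1, 0


def action(strategy, opponent_history):
    """Return the next action of a no-own-memory strategy.

    strategy         -- tuple of sub-strategy numbers, e.g. (1, 2, 12, 240)
    opponent_history -- list of the opponent's past actions (oldest first)
    """
    memory = len(strategy) - 1
    k = min(len(opponent_history), memory)      # number of usable past actions
    remembered = opponent_history[len(opponent_history) - k:]
    # Oldest remembered action -> lowest bit, most recent -> highest bit.
    state = sum(a << i for i, a in enumerate(remembered))
    # The strategy cooperates if the bit belonging to this state is set.
    return (strategy[k] >> state) & 1


def play(strategy_1, strategy_2, iterations):
    """Play an iterated game and return the list of action pairs."""
    history_1, history_2 = [], []
    for _ in range(iterations):
        a1 = action(strategy_1, history_2)
        a2 = action(strategy_2, history_1)
        history_1.append(a1)
        history_2.append(a2)
    return list(zip(history_1, history_2))


TFT = (1, 2, 12, 240)     # tit-for-tat with size-three memory
ALLD = (0, 0, 0, 0)       # always defect
ALLC = (1, 3, 15, 255)    # always cooperate

if __name__ == "__main__":
    print(play(TFT, ALLD, 5))
    print(play(TFT, ALLC, 5))
```

Against always-defect, tit-for-tat cooperates once and defects from then on; against always-cooperate it never defects, which is the behavior the numbers (1/2/12/240) are meant to encode.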



3 The Cellular Automata Perspective 

This section serves to have another perspective at the system in terms of cellular 
automata. This can help to get a visual idea of the system dynamics. However, 
the reader may well skip this and proceed to the next section. 

Wolfram's elementary cellular automata are defined (or interpreted) to exist 
in one spatial plus one temporal dimension. However, one can also apply the 
rules to a point-like cellular automaton with memory. Figure 2 shows an example 
for this. One can also interpret this system not as a cellular automaton that 
has a memory and a binary state, but as an automaton that can have one of 
eight states with transitions between the states being restricted. For the full 
set of 256 rules each state can be reached in principle from two other states 
and from a particular state also two states can be reached. Choosing a specific 
rule is selecting one incoming and one outgoing state. This is exemplified in 
Figure 3 for rule 110. For the iterated prisoner's dilemma one needs two such
cellular automata that interact, i.e. that each determine their next state from the data of
the other automaton, as shown in Figure 4. It is of course possible to interpret
two interacting cellular automata as one single point-like cellular automaton

Memory size   #Bits     #Strategies    #Games in one iteration
self/other

0/0             1                 2    1 resp. 3
0/1             3                 8    28 resp. 36
1/1             5                32    496 resp. 528
0/2             7               128    8,128 resp. 8,256
1/2            13             8,192    ≈ 33.55 · 10^6
2/1            13             8,192    ≈ 33.55 · 10^6
0/3            15            32,768    ≈ 536.8 · 10^6
2/2            21         2,097,152    ≈ 2.199 · 10^12
1/3            29       536,870,912    ≈ 144.1 · 10^15
3/1            29       536,870,912    ≈ 144.1 · 10^15
0/4            31     2,147,483,648    ≈ 2.306 · 10^18

Table 1: Number of bits (b) to represent a strategy, number of strategies ($2^b$),
and number of prisoner's dilemma games in an iteration step in a round-robin
tournament ($2^{b-1}(2^b \pm 1)$) for different memory sizes. This leads to a
computational effort shown in Table 2.



Memory size   RAM       Time
self/other

0/0           10 B      insignificant
0/1           100 B     insignificant
1/1           10 KB     s .. min
0/2           100 KB    s .. min
1/2           100 MB    min .. d
2/1           100 MB    min .. d
0/3           10 GB     h .. weeks
2/2           10 TB     d .. year
1/3           1 EB      > year
3/1           1 EB      > year
0/4           10 EB     decade(s) (?)

Table 2: Magnitudes of computational resource requirements (on a double quad
core Intel Xeon 5320). The computation time depends significantly on the
number of different payoff matrices to be investigated. Large scale simulations
of the iterated prisoner's dilemma with parallel computing have also been dealt
with in [44].



Figure 2: Rule 110 applied self-referentially to a point-like cellular automaton
with memory (time increases toward the right). Note: as time increases toward the
right and the most recent state is meant to be stored in the highest bit, but higher
bits are notated to the left, one has to reverse the bits compared to Wolfram's
standard notation.





          C(2)      D(2)

C(1)      R R       S T

D(1)      T S       P P

Table 3: General payoff matrix



with a larger set of states. Then Figure 4 would translate to Figure 5. One
could now again draw a transition graph (with 64 nodes that all have one of
four possible incoming and outgoing links for a specific combination of rules) for
further theoretical analysis. For this work we shall now abandon these basic and
theoretical considerations and just adhere to the fact that the implementation
of the process can be seen as a cellular automaton, more precisely an enormous
number of combinations of interacting, very simple cellular automata.
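
The point-like cellular automaton with memory described in this section can be sketched in a few lines of Python. The sketch is hedged in two ways: the names `step` and `trajectory` are hypothetical, and the lookup convention (the three remembered states form a number with the most recent state in the highest bit, and the next state is the bit of the rule number at that position) is an assumption based on the caption of Figure 2. Because of the bit reversal mentioned there, the resulting patterns need not coincide with Wolfram's standard pictures.

```python
def step(rule: int, state: int) -> int:
    """Advance the 3-bit memory of a point-like automaton by one time step.

    state -- number 0..7; bit 0 is the oldest and bit 2 the most recent state
    rule  -- number 0..255; bit `state` of it is the newly produced state
    """
    new_bit = (rule >> state) & 1
    # Forget the oldest state, shift the others down, store the new one on top.
    return (state >> 1) | (new_bit << 2)


def trajectory(rule: int, state: int, steps: int) -> list:
    """Return the sequence of memory states visited from `state`."""
    visited = [state]
    for _ in range(steps):
        state = step(rule, state)
        visited.append(state)
    return visited


if __name__ == "__main__":
    print(trajectory(110, 1, 4))
    # From each of the eight states exactly two states can be reached
    # when all 256 rules are considered.
    for s in range(8):
        print(s, sorted({step(rule, s) for rule in range(256)}))
```

The second loop illustrates the statement that from a particular state two states can be reached in principle; choosing a rule selects one of them for every state.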

4 Payoff Matrix 

The four values T, R, P, and S of the payoff matrix (see Table 3) need to fulfill
the relation

$T > R > P > S$   (1)

to be faced with a prisoner's dilemma. For the purpose of this contribution one
can choose S = 0 without loss of generality, as whenever the payoff matrix is
applied all strategies have played the same number of games. In addition to
equation (1) it is often postulated that

$2R > T$   (2)

holds.

The equation

$T + S = P + R$   (3)

Figure 3: Transition graph for rule 110 (black links) and possible links of other
rules (grey links).

Figure 4: Rule 184 and rule 110 interacting. As a model for the iterated
prisoner's dilemma the dependence here models the situation that a prisoner
remembers the three preceding moves of the opponent but none of its own.

Figure 5: Figure 4 as one single cellular automaton. If the states of both
automata are white (black) the state here is shown as well as white (black). If
184 is white (black) and 110 black (white), the state here is yellow (red).

as well marks a special set of payoff matrices, as those values can be seen as
a model of a trading process, where the exchanged good has a higher value for
the buyer i than the seller j:

$P_{ij} = \alpha + \beta \delta_i - \gamma \delta_j$   (4)

where $\delta = 1$ if a player cooperates and $\delta = 0$ if he defects. Therefore $\beta$ can
directly be interpreted as the "gain from receiving" value and $\gamma$ as the "cost from
giving" value; $\alpha$ is a constant to guarantee $P_{ij} > 0$ for technical
convenience. T, R, P, and S can be calculated from these: $T = \alpha + \beta$,
$R = \alpha + \beta - \gamma$, $P = \alpha$, and $S = \alpha - \gamma$. Aside from the descriptive
interpretation as "gain from receiving" and "cost from giving" this reparametrization has
the advantage that the original condition equation (1) and the additional conditions
equation (2) and S = 0 reduce to $\beta > \gamma = \alpha$. Furthermore it is the form of the
basic equation in George Price's model for the evolution of cooperation [45, 46].

As we do not only want to investigate payoff matrices where equations (2)
and (3) hold, we rewrite

$T = (1 + a + b) P$   (5)
$R = (1 + a) P$   (6)
$a = \alpha/P - 1 > 0$   (7)
$b = \beta/P > 0$   (8)

In principle one could set P = 1 without loss of generality, but then it would not be
possible to write all combinations holds/does not hold of equations (2) and (3)
with integer-valued T and R. Now equation (3) simply can be written as

$b = 1$   (9)

and shall be investigated as one variant next to b > 1 and b < 1. And equation
(2) reads

$a + 1 > b$.   (10)

Here as well a + 1 = b and a + 1 < b will be investigated (always taking care
that a > 0 and b > 0 hold). Finally, a (<, =, >) 1 and a (<, =, >) b are relevant
conditions, if it is possible to distinguish in this way.

Obviously not all combinations of these conditions can hold simultaneously.
(a + 1 < b, b < 1) for example has no allowed solution. The allowed combinations
and the values for T, R, and P are shown in Table 4. For each combination
of conditions an infinite number of values could have been found. One could
have chosen to interpret ">" as "much greater than" but then selecting specific
numbers in a way would have been arbitrary. So the smallest numbers to fulfill
a set of conditions have been chosen as representatives.
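
The conditions can be checked mechanically for any integer payoff triple. The following Python sketch is only an illustration: the function names are hypothetical, and a and b are obtained by solving equations (5) and (6) for them, i.e. a = R/P - 1 and b = (T - R)/P, with S = 0 assumed as in the text. Exact fractions are used to avoid rounding issues in the comparisons.

```python
from fractions import Fraction


def ab_parameters(T: int, R: int, P: int):
    """Solve T = (1 + a + b) P and R = (1 + a) P for a and b."""
    a = Fraction(R, P) - 1
    b = Fraction(T - R, P)
    return a, b


def compare(x, y) -> str:
    """Return '<', '=' or '>' describing the relation of x to y."""
    if x < y:
        return "<"
    if x == y:
        return "="
    return ">"


def classify(T: int, R: int, P: int) -> dict:
    """Report the relations that distinguish the payoff variants."""
    a, b = ab_parameters(T, R, P)
    return {
        "b vs 1": compare(b, 1),          # '=' means T = R + P (with S = 0)
        "b vs a+1": compare(b, a + 1),    # '<' means 2R > T
        "a vs 1": compare(a, 1),
        "a vs b": compare(a, b),
    }


if __name__ == "__main__":
    for payoff in [(3, 2, 1), (5, 4, 2), (9, 5, 3), (7, 3, 2)]:
        print(payoff, classify(*payoff))
```

For example, (T, R, P) = (3, 2, 1) yields a = 1 and b = 1, so both T = R + P and 2R > T hold.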

5 Iteration, Tournament, and Scoring 

In an iteration step all strategies play a prisoner's dilemma against any of the 
other strategies and themselves. For this a strategy calculates its action from 

Cond. 1   Cond. 2      Cond. 3   T   R   P   T = R + P   2R > T

b = 1     a = 1                  3   2   1   holds       holds
b = 1     a > 1                  4   3   1   holds       holds
b = 1     a < 1                  5   3   2   holds       holds
b < 1     a = 1                  5   4   2               holds
b < 1     a > 1                  6   5   2               holds
b < 1     a < 1        b = a     4   3   2               holds
b < 1     a < 1        b > a     6   4   3               holds
b < 1     a < 1        b < a     6   5   3               holds
b > 1     b < a + 1    a > 1     5   3   1               holds
b > 1     b < a + 1    a = 1     7   4   2               holds
b > 1     b < a + 1    a < 1     9   5   3               holds
b > 1     b = a + 1    a = 1     4   2   1
b > 1     b > a + 1    a = 1     5   2   1
b > 1     b = a + 1    a > 1     6   3   1
b > 1     b > a + 1    a > 1     7   3   1
b > 1     b = a + 1    a < 1     6   3   2
b > 1     b > a + 1    a < 1     7   3   2

Table 4: Investigated variants of values for the payoff matrix.



the preceding actions of the specific opponent. If $N^T_{ij}$, $N^R_{ij}$, $N^P_{ij}$,
$N^S_{ij}$ are the counters of how often strategy i received a T, R, P or S payoff
playing against a specific strategy j, then in each iteration step for each i and
each j one of the four $N^X_{ij}$ is increased by one.

Now all the payoff matrices from Table 4 are applied one after the other to
calculate for each payoff matrix for each strategy i the total payoff $G^1_i$:

$G^1_i = \sum_j \left( T N^T_{ij} + R N^R_{ij} + P N^P_{ij} \right)$   (11)

The strategy (or set of strategies) i yielding the highest $G^1_i$ is one of the
main results for a specific iteration round and a specific payoff matrix.

Then the tournament is started. Each tournament round g is started by
calculating the average payoff of the preceding tournament round:

$\bar{G}^g = \frac{\sum_i \delta^g_i G^g_i}{\sum_i \delta^g_i}$   (12)

where $\delta^g_i = 1$ if strategy i was still participating in the tournament in
tournament round g and $\delta^g_i = 0$ else. Then $\delta^{g+1}_i$ is set to 0 if
$\delta^g_i = 0$ or if a strategy scored below average:

$G^g_i < \bar{G}^g$   (13)

The payoff for the next tournament round g + 1 is then calculated for all
strategies still participating in the tournament:

$G^{g+1}_i = \sum_j \left( T N^T_{ij} + R N^R_{ij} + P N^P_{ij} \right) \delta^{g+1}_j$   (14)

The tournament ends if only one strategy remains or if all remaining strategies
score equal in a tournament round (i.e. they have identical $G^g_i$). The strategies
which manage to emerge as winners of such a tournament are the second main
result for a specific iteration step and a specific payoff matrix.

Such an elimination tournament can be interpreted as an evolutionary
tournament, where the frequency values for the strategies can only take the values
f = 0 and f = 1.

To state it explicitly: all strategies participate again in the next iteration 
step for another first round of the tournament. The elimination process only 
takes place within an iteration step and not across iteration steps, and there is 
no prisoner's dilemma game played in or between the rounds of a tournament. 
As all strategies are deterministic this procedure is equivalent to playing the
prisoner's dilemma for a fixed number of iterations, evaluating the scores, eliminating
all strategies scoring below average, playing again the fixed number of iterations
with the remaining strategies, and so on.
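
The elimination procedure of this section can be sketched as follows. The Python code is a minimal illustration under stated assumptions, not the program used for the investigation: the function name and the input format are hypothetical, `pair_payoff[i][j]` is assumed to already hold the accumulated payoff of strategy i against strategy j for one payoff matrix, and a strategy is kept as long as it does not score below the average of the strategies still participating.

```python
def elimination_tournament(pair_payoff, include_self=True):
    """Return the indices of the strategies that win the tournament.

    pair_payoff[i][j] -- total payoff strategy i collected against strategy j
    include_self      -- whether a strategy's game against itself counts
    """
    alive = list(range(len(pair_payoff)))
    while True:
        # Payoff of each remaining strategy against the remaining field.
        scores = {
            i: sum(pair_payoff[i][j] for j in alive if include_self or j != i)
            for i in alive
        }
        # Stop if one strategy is left or all remaining ones score equal.
        if len(alive) == 1 or len(set(scores.values())) == 1:
            return alive
        count = len(alive)
        total = sum(scores.values())
        # Keep i if its score is not below average; the comparison is done
        # with integers (score * count >= total) to avoid rounding effects.
        alive = [i for i in alive if scores[i] * count >= total]


if __name__ == "__main__":
    # One-shot payoffs for T = 3, R = 2, P = 1, S = 0: always defect (index 0)
    # against always cooperate (index 1).
    print(elimination_tournament([[1, 3], [0, 2]]))
```

The loop always terminates: unless all scores are equal, at least one strategy scores below average and is removed, while at least one strategy scores at or above average and stays.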

6 Results 

In this section for all payoff matrices of Table 4 the strategies are given that for
large numbers of iteration steps have the highest payoff $G^1_i$ in the first round of
the tournament and those strategies that win the tournament - if the system
stabilizes to one winner. Additionally the iteration round is given when this
winning strategy (or these winning strategies) appeared for the first time to stay
continuously until the last calculated iteration. This implies that for a certain payoff
matrix prior to this iteration the number of iterations is important for the question
which strategy will emerge as the best (in the sense described in section 5).

6.1 Results for No-Own-One-Opponent Memory 

With only one action to remember, there are just 8 strategies (named (0/0) to 
(1/3)). (0/0) never cooperates, (1/3) always. TFT is (1/2). 1000 iteration steps 
were done. It's safe to say that this is sufficiently long, as the results - shown 
in tables 5 and 6 - stabilize at latest in iteration step 16 (respectively 179). 

6.2 Results for One-Own-One-Opponent Memory 

With this configuration beginning with the second iteration step strategies base 
their decision on two bits, one (the higher bit) in which is encoded the action 
of their opponent and one in which is remembered their own action. For an 
overview in Table 7 numbers and behaviors are compared. 

For this and all further settings 10,000 iterations (and in special cases more) 
have been simulated. Results are shown in tables 8 and 9. 

T   R   P   First it.   G^1_i   Tournament

3   2   1   8           (0/0)   (1/2)
4   3   1   4           (0/0)   (1/2)
5   3   2   16          (0/0)   (1/2)
5   4   2   6           (0/0)   (1/2)
6   5   2   4           (0/0)   (1/2)
4   3   2   10          (0/0)   (1/2)
6   4   3   14          (0/0)   (1/2)
6   5   3   6           (0/0)   (1/2)
5   3   1   4           (0/0)   (0/0)
7   4   2   4           (0/0)   (0/0)
9   5   3   4           (0/0)   (0/0)
4   2   1   4           (0/0)   (0/0)
5   2   1   4           (0/0)   (0/0)
6   3   1   4           (0/0)   (0/0)
7   3   1   4           (0/0)   (0/0)
6   3   2   4           (0/0)   (0/0)
7   3   2   4           (0/0)   (0/0)

Table 5: Results for (no own / one opponent) memory, if strategies also play
against themselves. "First it." denotes the iteration round from which on the results
remain the same until iteration round 1000. TFT wins the tournament if b ≤ 1
(regardless of a), while a comparison of the whole set of strategies is always won
by ALLD (defect always).

T   R   P   First it.   G^1_i   Tournament

3   2   1   8           (0/0)   (0/0)
4   3   1   8           (0/0)   (0/0)
5   3   2   12          (0/0)   (0/0)
5   4   2   162 (2)     (0/0)   (0/2), (1/2)
6   5   2   179 (2)     (0/0)   (0/2), (1/2)
4   3   2   108 (2)     (0/0)   (0/0), (0/2)
6   4   3   168 (2)     (0/0)   (0/0), (0/2)
6   5   3   80 (2)      (0/0)   (0/0), (0/2)
5   3   1   4           (0/0)   (0/0)
7   4   2   7           (0/0)   (0/0)
9   5   3   8           (0/0)   (0/0)
4   2   1   4           (0/0)   (0/0)
5   2   1   4           (0/0)   (0/0)
6   3   1   4           (0/0)   (0/0)
7   3   1   4           (0/0)   (0/0)
6   3   2   8           (0/0)   (0/0)
7   3   2   4           (0/0)   (0/0)

Table 6: Results for (no own / one opponent) memory, if strategies do not play
against themselves. Numbers in brackets in column "First it." denote period
length, if results oscillate. Entries marked in italics each second iteration do not
co-win the tournament, if the results alternate. This setting is much less prone
to lead to cooperation than if strategies also do play against themselves.



numbers for strategies   latest own   latest opponent

(?/1)                    D            D
(?/2)                    C            D
(?/4)                    D            C
(?/8)                    C            C

Table 7: A strategy cooperates, if its number is composed of the elements of
this table. TFT for example is (1/12) (cooperate, if line three or line four of this
table is remembered: (1/4+8)).
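
The composition rule of Table 7 amounts to a bit lookup. The following Python sketch shows this for the second element of a one-own-one-opponent strategy; the function name is hypothetical, and the mapping of the remembered pair to a bit position (own action in the lower bit, opponent's action in the higher bit, C = 1, D = 0) is an assumption that reproduces the four lines of the table.

```python
C, D = 1, 0


def cooperates(rule: int, own_last: int, opp_last: int) -> bool:
    """Decide the next action from the remembered pair of actions.

    rule -- second element of the strategy number, 0..15
    """
    state = own_last + 2 * opp_last      # (D,D)->0, (C,D)->1, (D,C)->2, (C,C)->3
    return bool((rule >> state) & 1)     # bit values 1, 2, 4, 8 as in Table 7


if __name__ == "__main__":
    for name, rule in [("TFT", 12), ("pavlov", 9), ("(1/8)", 8)]:
        for own in (D, C):
            for opp in (D, C):
                print(name, "own", "CD"[1 - own], "opp", "CD"[1 - opp],
                      "->", "C" if cooperates(rule, own, opp) else "D")
```

With this reading, rule 12 cooperates exactly when the opponent cooperated, and rule 9 cooperates exactly when both players chose the same action in the preceding iteration.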

T   R   P   First it.   G^1_i      Tournament

3   2   1   8           set of 4   (1/8), (1/12)
4   3   1   66          (1/8)      (1/8), (1/9), (1/12), (1/13)
5   3   2   18          set of 4   (1/8), (1/12)
5   4   2   21          (1/8)      (1/8), (1/12)
6   5   2   18          (1/8)      (1/8), (1/9), (1/12), (1/13)
4   3   2   12          set of 4   (1/8), (1/12)
6   4   3   21          set of 4   (1/8), (1/12)
6   5   3   27          (1/8)      (1/8), (1/12)
5   3   1   8           set of 4   (1/8), (1/12)
7   4   2   15          set of 4   (1/8), (1/12)
9   5   3   18          set of 4   (1/8), (1/12)
4   2   1   1398        set of 4   (1/12)
5   2   1   10          set of 4   (1/8)
6   3   1   30          set of 4   (1/8), (1/12)
7   3   1   6           set of 4   (1/8)
6   3   2   645         set of 4   (1/12), (1/8)
7   3   2   15          set of 4   (1/8)

Table 8: Results for (one own / one opponent) memory, if strategies also play
against themselves. "set of 4" consists of four strategies: (0/0), (0/2), (0/8),
(0/10). All strategies that win the tournament cooperate in the first iteration
and at least continue to cooperate upon mutual cooperation (1/≥8). (1/12)
(TFT) is not among the winners if b > a + 1. (?/9) is the strategy that
sticks with its behavior if the opponent has cooperated and else changes it,
i.e. it is "pavlov". (1/8) can also be seen as a pavlovian strategy, but a more
content one than (1/9) - happy with anything but S and thus repeating the
previous behavior except if having received S. No rule is among the winners
that continues cooperation if the opponent has defected. (Strategy (0/2) would
do so, but it never can reach the state that it cooperates.)

set of 4   set of 4, (O/J^)
set of 4   set of 4 altern. ((0/4), (1/4))

Table 9: Results for (one own / one opponent) memory, if strategies do not play
against themselves. "set of 4" consists of four strategies: (0/0), (0/2), (0/8),
(0/10).

6.3 Results for No-Own-Two-Opponent Memory 

Now 10,000 iteration steps were carried out. Again this is far more than the 
largest number of iterations before the process settles down in some way. Now 
TFT is (1/2/12) and TF2T is (1/3/14). Results are shown in tables 10 and 11. 

6.4 Results for One-Own-Two-Opponent Memory 

In this case, one could in principle reduce the size of the strategy, as it makes 
no sense to distinguish between strategies that cooperate or defect in the second 
iteration, if hypothetically they cooperate in the first iteration, when in fact they 
defect in the first iteration. For the simulation the number of strategies has not 
been reduced to the subset of distinguishable ones, as this would have been a 
source of error for the source code, and at this stage, the effect on required 
resources for computation is negligible. Thus for each strategy there are three 
more that yield exactly the same results against each of the strategies. In the 
table of results (table 12) just one of the four equivalent strategies is given - the 
one with the smallest number. This means that in case of initial defection adding 
2, 8, or 10 to the middle number gives the equivalent strategies and in case of 
initial cooperation, it is 1, 4, or 5. Therefore TFT is (1/8/240), (1/9/240), 
(1/12/240), and/or (1/13/240). Even when the results are reduced by naming 
only one of four strategies linked in this way, this is the first configuration, where 

(1/2/2)    (1/0/2)
395 (4)
41 (2)
(1/2/4) altern. (0/3/4)
127 (2)    (0/0/2)    (1/2/4) altern. (0/3/4)

Table 10: Results for (no own / two opponent) memory, if strategies also
play against themselves. For 6-3-1 (1/0/2) wins two iteration rounds and then
(0/1/2) and then (0/1/2) and (0/3/2) win. For 7-3-1 it is similar, but (0/3/2)
does never win. Compared to Table 5 TFT (1/2/12) (or even more cooperative
strategies) mostly reappears, only disappears as winner of the tournament for
6-5-3, but newly wins 9-5-3. Thus, the general tendency that payoff matrices with
b ≤ 1 produce more cooperation is kept, but softened. The most cooperative
strategy co-winning a tournament is (1/3/14), which only defects if it
remembers two defections of the opponent. Overall - compared to the settings with
smaller memory - the dominance of "always defect" has vanished, especially in
the first round of the tournament.

5   3   1   359        (1/2/2)
            1644 (3)   (0/0/2)   (1/2/4), (0/2/4)
        1   2891 (2)   (0/0/2)   (0/2/4), ((1/2/4) alt. (0/3/4))
        1   13 (2)     (0/0/2)   (0/2/4), (0/3/4)
        1   515 (4)    (1/2/2)   (1/0/2)
            731
                                 (1/0/2)
            85 (2)
                                 (0/2/4), ((1/2/4) alt. (0/3/4))

Table 11: Results for (no own / two opponent) memory, if strategies do not
play against themselves. For payoff 7-4-2 and 9-5-3 (0/2/4) co-wins in 2 out of
3 rounds. The comparison to Table 6 reveals that increasing memory size makes
cooperative strategies much more successful for almost all payoff matrices. None
of the payoff matrices that produced oscillating results with size-one memory
do so with size-two memory and vice versa.

the results are too complicated to be understandable at a glance.

There are even more strategies that yield identical results in any combination 
with any other player: for all strategies that continue to defect (cooperate) on 
own defection (cooperation) those elements of the strategy that determine what 
to do, following an own cooperation (defection) are never applied and the value 
of these elements has no effect. This phenomenon leads to a large number of 
strategies winning the tournament. Interestingly for some of the payoff matrices 
the number of winners is smaller around 20 or 30 iterations than at larger 
numbers of iterations. 

For this memory configuration there is almost no difference in the results, 
if strategies play against themselves or not: the strategies with the most points 
in the first round of the tournament, and the number of strategies winning the 
tournament are the same in both cases. Only if the number of strategies winning 
the tournament is large, a small number of strategies might be exchanged and 
the iteration round, when the results are stable, is different. In iteration rounds 
before stability, there can be larger differences, however. We refrain from giving 
a result table for the case when strategies do not play against themselves. 

6.5 Results for Two-Own-One-Opponent Memory 

This configuration is interesting because a strategy can interpret a
remembered opponent's action as a reaction to an own action that is remembered
as well. While TFT is (1/8/240), a strategy additionally cooperating in such
a case would be (1/8/244). As Table 13 shows, sometimes only TFT appears
among the winners of the tournament, sometimes both of these strategies. Only
with payoff matrix 6-5-2 does the more forgiving strategy win while TFT does
not. It is the trickier strategy (1/8/228), which also applies this kind of
forgiveness, that is more successful than TFT.

In this setting as well, it has only minor effects whether a strategy plays
against itself or not. Therefore the results for the case when they do not are
omitted.

6.6 Results for No-Own-Three-Opponent Memory 

Regarding the number of strategies, this setting is the largest investigated in
this work. The number of iterations until the results settle varies greatly
among the various payoff matrices. In fact, for some payoff matrices they did
not stabilize before the 30,000th iteration. At this point we refrained from
performing further calculations and accepted the (non-)result as an open issue
for future investigations. However, even for payoff matrices for which stable
results appear to have been reached, it cannot be excluded that after some
10,000 further iterations different winners would result, as in the more
volatile cases. Another surprising observation was that the results sometimes
appeared to have reached a final state but then started changing again. After
all, for remembering one of the opponent's actions stable results appeared after
approximately 10 iterations, and for remembering two of the opponent's actions
after about 1,000 iterations. So, it is
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(0/5/176V244) 


5 2 1    278    set of 4
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7 
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set of 4 


(0/1/180) 
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set of 4 


(0/1V5/180V244), (0/5/176V244)
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422 


set of 4 


(0/1/180) 



Table 12: Results for (one own / two opponent) memory, if strategies also
play against themselves. The "V" is used with the common meaning of "or".
(1/10/160) cooperates in the first and second iteration and then continues to
cooperate if both strategies have cooperated; otherwise it defects. This implies
that it does not make use of the second to last iteration and is thus simpler
than the memory size would allow. Except for the definite cooperation in the
second iteration, it is strategy (1/8) from the (one / one) setting. "Set of 4"
consists of (0/0/1V9V129V137), which all do make use of the information on the
opponent's second to last action. "Set of 22" is
(1/8V10/176V180V208V212V240V244),
(1/8/144V146V148V150V178V182V210V214V242V246) and thus includes
TFT. "Set of 13" is (1/10/148), (1/8V10/132V140V164V196V204V228).
"Set of 17" includes "set of 13", (1/8/168V172V232), and (1/10/144).
"Set of 30" contains "set of 13", (1/8V10/128V136V160V192V200V224),
(1/8/130V162V194V226), and (1/10/144). "Set of 37" consists of "set of
30", (1/8V10/168V172V232), and (1/10/236). The remaining four sets
("set of 20", "set of 39", "set of 25", and "set of 29") have in common
(1/10/168V172V184V188V204V232V236V248V252), which includes TF2T. A total
of 41 further strategies appear as members of these sets, of which a majority
(28) have not appeared earlier in this table and its caption.
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Table 13: Results for (two own / one opponent) memory, if strategies also 
play against themselves. For the payoff matrices from the top down to 5-3-1,
strategy (1/8/228) is always among the winners of the tournament. It is the
strategy that almost plays tit for tat, but does not cooperate if the opponent
has cooperated and it has itself defected two times, and does cooperate if the
opponent has defected after it has itself defected, even if it has itself
cooperated in the most recent game. For the winner strategy (0/1/4) of the first
round of the tournament, this history is even the only case in which it
cooperates. For the payoff matrices 5-3-2 and 7-4-2, 20,000 iterations were
calculated to verify the late stability and the period of 4, respectively.
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not unrealistic to assume that remembering three opponents' actions may need 
100,000 or even more iterations until the results do not change anymore. 

Further difficulties may arise from precision issues in the calculation. During
the tournament it is decided by comparison with the average of points whether a
strategy may participate in the next round. The average is calculated by
dividing one very large number by another very large number. As a consequence,
the size comparison between the average and an individual result may be faulty
if a strategy has in fact achieved exactly the average of points and is thereby
kicked out of the tournament. Another resource problem is the possibility that
the sum of points produces an overflow in the corresponding integer variable.
That such considerations could be relevant when dealing with such large numbers
is based on general experience with complex simulations; in the results there
was no explicit hint that such issues really occurred, except that the long
instability of the results, which appeared surprising in principle, could be
attributed to them. Ruling them out would require a second computer system with
a different hardware architecture or a very thorough understanding of the CPU
and the compiler that were used. None of these were sufficiently available.
Additionally, one has to consider that each simulation run currently takes days
to arrive at the number of iterations where these issues could be relevant. In a
nutshell: using up-to-date standard computer systems, the
no-own-three-opponent-memory case is today at the edge of what is accessible.
Definitely ruling out negative effects that falsify the results, and doing this
with maintainable effort, remains to be done in the future.
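
One way to sidestep the division is to carry out the comparison in integer arithmetic: a score s is at least the average total/n exactly when s * n >= total. The following is a minimal sketch in Python and not the program used for this work. The function names are hypothetical, and the rule that a strategy scoring exactly the average stays in the tournament is an assumption made for illustration.

```python
def survivors_exact(scores):
    """Indices of strategies scoring at least the average, integers only.

    Assumption of this sketch: a strategy that scores exactly the
    average stays in the tournament.
    """
    n = len(scores)
    total = sum(scores)  # Python integers cannot overflow
    # s >= total / n  is equivalent to  s * n >= total  for n > 0
    return [i for i, s in enumerate(scores) if s * n >= total]


def survivors_float(scores):
    """Same rule, but with a floating-point average (for comparison)."""
    average = sum(scores) / len(scores)  # rounded to a 64-bit float
    return [i for i, s in enumerate(scores) if s >= average]


# Three strategies with identical, very large scores: each one scores
# exactly the average, so under the assumed rule all of them should stay.
scores = [10**17 + 15] * 3
print(survivors_exact(scores))
print(survivors_float(scores))
```

With these values the floating-point average is rounded upward to the next representable number, so the second function discards strategies that the integer comparison keeps. In a language with fixed-width integers the products s * n and the sum would additionally need a sufficiently wide type.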

As calculating the payoff and evaluating the tournament takes more computation
time than calculating the results of the dilemma, beyond 10,000 iterations the
payoff and the tournament were calculated only for the last one hundred
iterations ahead of each full thousand. This in turn implies that the iteration
number after which the results did not change anymore can only be given
approximately.

Having said all this, it becomes obvious that the results of this section need
to be considered as preliminary, and the more so the later the assumed stability
was observed.

A different problem is that in some cases the number of winners of the
tournament is too large to give all of the winning strategies in this paper.
However, the remaining cases should be sufficient to demonstrate the type and
especially the variants of strategies winning the tournament.

A majority of the strategies winning the first round of the tournament
cooperate when the earliest opponent's action they remember was cooperation
and all others were defection. This trend was already present with the memory
that is one element smaller, but it was not as pronounced. This strategy
is interesting in the sense that it uses the last chance to avoid breaking
entirely with the opponent. To find a catchy name for this strategy, recall
Mephisto's behavior toward God in the Prologue in Heaven of Faust I: "The
ancient one I like sometimes to see, And not to break with him am always
civil"^, where even

^The German original "Von Zeit zu Zeit seh ich den Alten gern, und hüte mich mit ihm
zu brechen." stresses the occasional character of the cooperative interaction even more.
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considering all the competition between the two, Mephisto avoids entirely aban- 
doning cooperation. If one extrapolates Mephisto to even larger memory sizes, 
cooperation vanishes more and more, although some basic cooperative
tendency is kept in the strategy. Two questions arise: whether this trend would
actually continue indefinitely as memory size is increased further, and what it
means that, for example, the one-own-two-opponent memory configuration yields
strategies as winners of the first round of the tournament that have entirely
different characteristics.
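
The Mephisto rule and its extrapolation to larger memories can be written down in a few lines. The sketch below is an illustration only: the function names are hypothetical, and defecting until the memory is filled is an assumption, since the initial moves are not specified in this paragraph.

```python
from itertools import product

C, D = "C", "D"


def mephisto(opponent_history, memory=3):
    """Cooperate only if the earliest remembered opponent action was C
    and every later remembered action was D; otherwise defect."""
    recent = tuple(opponent_history[-memory:])
    if len(recent) < memory:
        return D  # assumption: defect until the memory is filled
    return C if recent == (C,) + (D,) * (memory - 1) else D


def cooperative_share(memory):
    """Fraction of all remembered histories that trigger cooperation."""
    histories = list(product((C, D), repeat=memory))
    cooperating = sum(mephisto(h, memory) == C for h in histories)
    return cooperating / len(histories)


for m in (2, 3, 4, 5):
    print(m, cooperative_share(m))
```

Exactly one of the 2^m possible histories leads to cooperation, so the cooperative share halves with every additional remembered action, which is the sense in which cooperation vanishes more and more.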

The results are shown in Table 14.

7 Summary and Outlook 

The calculations of this work reveal a strong dependence of the results of the
tournament on the details of the payoff matrix. It is not sufficient to
distinguish whether T + S = R + P and 2R > T + S hold or not. This means that
one has to be careful when drawing conclusions if the prisoner's dilemma is used
as a toy model for some real system. Of course, as this work restricted
strategies to limited memory size, there might be strategies relying on infinite
memory that outperform all of these regardless of the details of the payoff
matrix. So, the main result of this work is not that everything changes with a
different payoff matrix, but that one should not be too confident that the
precise choice of the payoff matrix is irrelevant.
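
For a quick check of which payoff matrices share the same status with respect to the two relations, a small helper is sufficient. The sketch below is hypothetical and rests on an assumption: labels such as 7-4-2 are read as T-R-P with S = 0.

```python
def payoff_relations(T, R, P, S=0):
    """Evaluate the two relations for one payoff matrix.

    Assumption of this sketch: S defaults to 0 when a label gives
    only three numbers.
    """
    return {"T+S == R+P": T + S == R + P, "2R > T+S": 2 * R > T + S}


# Payoff matrices named in the text, read as T-R-P with S = 0 (assumed).
for label in ("7-4-2", "9-5-3", "6-5-2", "5-3-1", "5-3-2"):
    T, R, P = (int(x) for x in label.split("-"))
    print(label, payoff_relations(T, R, P))
```

Matrices that fall into the same class under this check can then be compared directly in the result tables.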

As expected, the two basic relations T + S = R + P and 2R > T + S clearly
have an influence on the results, as subsets of strategies tend to appear among
the winners depending on whether these relations hold or not. The picture is a
bit different for the winner of the first round of the tournament, when all
strategies still participate: there are fewer strategies appearing as winners,
but if there is more than one for a memory configuration, there is no obvious
pattern based on these relations that tells which strategy wins if a specific
payoff matrix is applied. In total, one cannot claim that the details of the
payoff matrix will dominate each element of the results in any case. However,
in general one can say that the results do depend on the specific choice of the
payoff matrix. Furthermore, it is not only impossible to find one generally best
strategy or a set of generally best strategies, but, if one compares the winners
of the first round of the tournament and of the tournament as a whole, even for
a specific payoff matrix it cannot be decided in general whether cooperating is
a good or bad idea, as this depends on the kind of result that decides the
winner.

While for these reasons it is usually not possible to use the prisoner's
dilemma as some kind of proof that in some real system cooperating yields
the best payoff, the results of this work, like those of many preceding works,
help to keep in mind that cooperating might be the better idea, even if at first
sight one might have the opposite impression. The iterated prisoner's dilemma
obviously is an abstract and simplified model for any real social system, and
the four entries of the payoff matrix often are not set quantitatively by the
real system. In such cases conclusions drawn from calculations can only be valid if
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(1/3/8V9/226) alt. (0/3/13V15/226) 


7 3 1    ≈ 9,000        (1/2/10/2)    (0/3/13/226)
6 3 2    ≈ 9,000 (2)    (0/0/0/2)     (0/3/15/226) alt. (1/3/8V9/226), (1/3/9/240)
7 3 2    1229           (0/0/0/2)     (0/3/13/226)



Table 14: Results for remembering the opponent's three preceding actions.
(Strategies do play against themselves.) For 5-4-2-0, after a varying number of
iterations (roughly 10), another result with 14 tournament winning strategies
appears. These do not include the 6 given here.
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the results do not significantly depend on details of the payoff matrix. 

In some cases the results stabilized only after a very large number of
iterations, a number far larger than, for example, the number of iterations in
the tournaments performed by Axelrod [2]. This does not necessarily mean that it
is useless to investigate cases with fewer iterations, as even before the
results stabilize they oscillate between two sets or between a set and a proper
subset. As the number of iterations needed for stability grows with the number
of participating strategies, and as the number of participating strategies is
already quite large in cases where stability only occurs beyond 1000 iterations,
one can assume that for most investigations of the iterated prisoner's dilemma
that have been published so far the number of iterations was sufficiently high.
Still, the results of this work indicate that an investigation of the effect of
having ±20 iterations should usually be worth the effort.

The results show a tendency for somewhat cooperative strategies to score
better with increased memory size. There have been investigations on the
dependency between good memory and scoring in an iterated prisoner's dilemma
[47, 48]; however, the present work is rather inconclusive on this issue. With
memory size the number of strategies also increases, and cooperative strategies
find more strategies that cooperate as well. A comparison of Tables 5 and 6
supports this idea, as it shows how it benefits cooperative strategies when
there is one more cooperative counterpart (themselves) participating in the
tournament. The fact that with increasing memory size it eventually no longer
plays any role whether strategies play against themselves or not shows that in
these cases the strategies are related to some of the others in such a way that
playing against them is in effect like playing against themselves. On the other
hand, if a good memory did not matter, there should be more strategies among the
winners that do not make use of all the past information available in
principle.

In this work the results have mainly been presented and, despite the
considerable extent of the paper, only scarcely been analyzed and discussed.
There are plenty of possibilities to discuss the success or poor performance of
a specific strategy in a specific memory configuration with a specific payoff
matrix in analytical terms. For settings that yield large sets of tournament
winners, the results can be investigated statistically. Once stronger
computational resources are available, larger memories can be investigated and
the case of remembering three of the opponent's actions can be investigated more
reliably.

In this work the idea was to simulate as many rounds as are necessary to
yield stable results. The development of the results over the rounds was not
examined and thus could be investigated in further studies.

For the tournament itself one can think of many variants. One could, for
example, eliminate only those strategies scoring worst in an iteration, or
always eliminate (as far as possible) exactly half of the strategies still
running. It is also possible to allow initial population weights different from
one.

And finally the role of the payoff matrix can be investigated in greater depth. 
In this work no two payoff matrices always gave the same result (although the 
results of 7-4-2 and 9-5-3 were always at least similar). Is it possible at all that 
two payoff matrices that are not related trivially yield the same results? And 
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if this is the case, what is (if it exists) the simplest parametrization and set of 
relations between the parameters, which generates all payoff matrices that yield 
all possible results? Can the winning strategies or the number of iterations until 
stability be derived analytically? 

The differences between the results with different payoff matrices might
also diminish if the tournament were not carried out in a binary way, but if
the frequency of a strategy could take a real value and the frequencies of a
round depended on the score (fitness) of the preceding round. It would then be
possible for a strategy to score below average, for example in the first round,
but recover in subsequent rounds.
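
One possible form of such a non-binary tournament is a fitness-proportional (replicator-type) update of the frequencies. This update rule is a choice made for this illustration and is not specified in this work; the function name and the two-strategy example are likewise hypothetical.

```python
def update_frequencies(freq, payoff):
    """One round of fitness-proportional reweighting.

    freq[i]      -- current frequency of strategy i (sums to 1)
    payoff[i][j] -- points strategy i earns in a match against j
    Assumes non-negative payoffs and a positive mean fitness.
    """
    n = len(freq)
    fitness = [sum(payoff[i][j] * freq[j] for j in range(n)) for i in range(n)]
    mean = sum(f * x for f, x in zip(fitness, freq))
    return [x * f / mean for x, f in zip(freq, fitness)]


# Always-defect (index 0) versus always-cooperate (index 1), a single
# game per match with T=5, R=3, P=1 and S=0 (S = 0 is an assumption).
payoff = [[1, 5],
          [0, 3]]
freq = update_frequencies([0.5, 0.5], payoff)
print(freq)
```

In this example the cooperating strategy scores below average in the round, yet it keeps a positive frequency instead of being eliminated, so it remains available in later rounds.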
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